Some Statistical Characterisations of Terminological and Non-terminological Elements: Evaluation and Examination in Japanese Technical Abstracts

Authors

  • Kyo Kageura
  • Keita Tsuji
  • Atsuhiro Takasu
Abstract

1 Introduction

Corpus-based, quantitative approaches are important to the study of terminology, because terms are, unlike words, elements which can only be recognised at the level of language fact (Kageura 1995). Despite this, the only work which takes this approach is automatic term recognition (ATR). Most of the simple and straightforward quantitative characterisations of terms have already been pursued in ATR work. All ATR methods perform reasonably well, but not completely satisfactorily. In addition, we do not know which method really is better. One reason for this is that ATR work has not clarified its real target. Is it attempting to recognise all the terms in a document, in a corpus, or in a field, or a representative subset? Our standpoint is that the principal target of quantitative terminological study, either theoretical or practical, should be a terminology of at least one domain as a whole, and not individual terms or an arbitrary set of terms appearing in a given corpus. In this paper, we initially consider the validity of some quantitative characterisations of terminological elements in Japanese technical abstracts, based on a few basic viewpoints used in existing ATR and automatic indexing work. Then we examine the meaning and theoretical position of this type of approach, and argue for the necessity of a clearer and more integrated approach to the statistical modelling of terminology.


Similar resources

A Statistical Analysis of Morphemes in Japanese Terminology

In this paper I will report the result of a quantitative analysis of the dynamics of the constituent elements of Japanese terminology. In Japanese technical terms, the linguistic contribution of morphemes differs greatly according to their types of origin. To analyse this aspect, a quantitative method is applied, which can properly characterise the dynamic nature of morphemes in terminology on ...


Creating and Using Domain-specific Ontologies for Terminological Applications

Huge volumes of scientific databases and text collections are constantly becoming available, but their usefulness is at present hampered by their lack of uniformity and structure. There is therefore an overwhelming need for tools to facilitate the processing and discovery of technical terminology, in order to make processing of these resources more efficient. Both NLP and statistical techniques...


Lessons from students: A pilot project to discover guidelines for creating a student-friendly, relation-rich term bank

Since the 1990s, there has been growing interest in two key types of terminological information: terminological relations (including generic-specific and part-whole, as well as various non-hierarchical relations), and terminological contexts. These come together in knowledge-rich contexts (KRCs), which both illustrate terms’ behaviour in texts and reveal important connections between terms and ...


The Terminological Tools And Challenges Of Asian Languages Term Representation

Asian languages such as Chinese, Japanese and Korean (CJK) have one thing in common: they exhibit different linguistic characteristics from English and other western languages. This poses special difficulties, since current term representation techniques used in English cannot be applied directly to these languages. Instead, segmentation is a first process in CJK information processing. Segmen...


How to evaluate necessary cooperative systems of terminology building?

Terminology building cannot be considered a fully automated process but rather a cooperative task between terminological tools and terminologists. Identifying terms in a technical domain is a matter of word usage and expert agreement. We point out the problem of the evaluation of such tools: their quality and their contribution to the terminology building is difficult to estimate and canno...




Publication date: 2007